Naming issues found in v030.

New assembly workflow

1) taking v031 (CD-HIT-EST clustering of 10 assemblies), then selecting only sequences >20,000 bp.

cgigas_alpha_v031 subset _20k.fa
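
The length selection in step 1 could be reproduced along these lines. This is only a sketch, not the tool actually used for the subset: the function names, the 60-column line wrapping, and the file names in the usage comment are placeholders, and the cutoff is treated as strictly greater than 20,000 bp.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len=20000):
    """Keep only records strictly longer than min_len bases."""
    for header, seq in records:
        if len(seq) > min_len:
            yield header, seq


def subset_fasta(in_path, out_path, min_len=20000):
    """Write records longer than min_len to out_path; return how many were kept."""
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq in filter_by_length(read_fasta(src), min_len):
            dst.write(f">{header}\n")
            # Wrap sequence lines at 60 columns (an arbitrary choice).
            for i in range(0, len(seq), 60):
                dst.write(seq[i:i + 60] + "\n")
            kept += 1
    return kept


# Hypothetical usage; substitute the real input and output paths:
# subset_fasta("input.fa", "output_20k.fa", min_len=20000)
```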

2) downloading BAC sequences,
then running them through CD-HIT-EST

./cd-hit-est -i /Volumes/Bay4\ scratch/temp/Galaxy56-[Tabular-to-FASTA_on_data_55].fasta -o /Volumes/Bay4\ scratch/temp/CgigasBAC_cdhit -M 2500

total seq: 60
longest and shortest : 203422 and 84264
Total letters: 8610155
Sequences have been sorted

Approximated minimal memory consumption:
Sequence        : 8M
Buffer          : 1 X 2068M = 2068M
Table           : 1 X 16M = 16M
Miscellaneous   : 4M
Total           : 2098M

Table limit with the given memory limit:
Max number of representatives: 4194304
Max number of word counting entries: 50200207

comparing sequences from          0  to         60

       60  finished         53  clusters

Apprixmated maximum memory consumption: 2150M
writing new database
writing clustering information
program completed !

Total CPU time 148
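
Since 60 sequences collapsed into 53 clusters, at least one cluster holds more than one BAC. CD-HIT writes cluster membership to a .clstr file next to the output (here that would be CgigasBAC_cdhit.clstr). A minimal sketch for listing the merged clusters follows, assuming the usual .clstr layout of ">Cluster N" header lines followed by member lines such as "0	203422nt, >name... *". The function names are placeholders, not part of CD-HIT.

```python
def parse_clstr(lines):
    """Return {cluster_id: [member names]} from the lines of a .clstr file."""
    clusters = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = []
        elif current is not None and ">" in line:
            # Member line: the name sits between ">" and the trailing "...".
            name = line.split(">", 1)[1].split("...", 1)[0]
            clusters[current].append(name)
    return clusters


def multi_member_clusters(clusters):
    """Keep only clusters that contain more than one sequence."""
    return {cid: names for cid, names in clusters.items() if len(names) > 1}


# Hypothetical usage; substitute the real path to the .clstr file:
# with open("CgigasBAC_cdhit.clstr") as fh:
#     merged = multi_member_clusters(parse_clstr(fh))
#     for cid, names in sorted(merged.items()):
#         print(cid, names)
```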


3) adding 12 select genes

